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Abstract. Bayesian solution of an inverse problem for indirect measurement M = 
AU + S is considered, where U is a function on a domain ofW^. Here A is a smooth- 
ing linear operator and £ is Gaussian white noise. The data is a realization mk of the 
random variable Mk = PkAU + Pk£ , where Pk is a linear, finite dimensional opera- 
tor related to measurement device. To allow computerized inversion, the unknown is 
discretized as Un = TnU , where T„ is a finite dimensional projection, leading to the 
computational measurement model Mkn = PkAUn + Pk£- Bayes formula gives then 
the posterior distribution 'Ukniun \ mkn) ~ n„(M„) exp(— |||mfc„ — PfcAM^III) in W'- , 
and the mean xikn '■= J UnT^kn{un \ rrik) dun is considered as the reconstruction ofU. 
We discuss a systematic way of choosing prior distributions n„ for all n > uq > by 
achieving them as projections of a distribution in a infinite- dimensional limit case. 
Such choice of prior distributions is discretization-invariant in the sense that H„ 
represent the same a priori information for all n and that the mean Ufc„ converges 
to a limit estimate as k,n —* oo. Gaussian smoothness priors and wavelet-based 
Besov space priors are shown to be discretization invariant. In particular, Bayesian 
inversion in dimension two with Bl^ prior is related to penalizing the (} norm of the 
wavelet coefficients ofU. 

Keywords. Inverse problem, statistical inversion, Bayesian inversion, discretiza- 
tion invariance, reconstruction, wavelet, Besov space 
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1. Introduction 
Consider a quantity U that can be observed indirectly through a relation 

(1) M = AU + S, 

where A is a smoothing linear operator and S is white noise. Here U and M are con- 
sidered as continuum objects, or functions defined on subsets of M*^, so that our dis- 
cussion applies to classical models of mathematical physics such as Laplace, Maxwell, 
Helmholtz or Schrodinger equation. We are interested in the use of Bayesian in- 
version to find information about U from measurement data concerning M. Let 
U{x,uj), M{y,uj) and £{y,u) be random functions where a; e f2 is an element of a 
complete probability space S,P) and x and y denote the variables in domains 
of Euclidean spaces. We analyze Bayesian estimates of U when a continuum model 
of the form ([T]) is approximated by finite- dimensional models to allow computerized 
inversion. 

Assume that a measurement device provides us with a realization of the random 
variable 

(2) Mk = PkM = AkU + Sk, 

where = P^A and Sk = Pk^- Here Pk is a linear operator related to the device; for 
simplicity we assume that P^ is an orthogonal projection with /c-dimensional range. 
We call ([2D the practical measurement model and ([T]) the continuum model. Realiza- 
tions of measurements are denoted by = Mk{uJo) and m = M{ujo), respectively, 
where tuo G f2 is a specific element in the probability space. 
This study concentrates on the inverse problem 

(3) given a realization Mk{uJo), estimate U, 

where the estimates in question are means and confidence intervals related to a 
Bayesian posterior probability distribution. 

For example, consider the brain imaging method called magnetoencephalogra- 
phy (meg), see e.g. [23]. Maxwell's equations describe how synchronized neuronal 
currents U in the cerebral cortex produce a magnetic field AU that can be mea- 
sured at the surface of the head. Let S denote the magnetic fields produced by all 
external sources; then the continuum model ([1]) describes the total magnetic field 
M = AU + £. In practice one measures the inner products (M, j = 1,2, . . . , k, 
where (pj are linearly independent device functions corresponding to measuring the 
flux of the magnetic field M through a small surface determined by the j'th mea- 
surement unit (squid). As an idealization, let us assume that (pj are orthonormal so 
that PkV = is an orthogonal projection. Then meg data is modelled 

by PkM. 

Computational solution of ([3]) using Bayesian inversion involves discretization of 
the unknown quantity U . We assume that U is a priori known to take values in a 
function space Y . Choose a linear projection :Y with n-dimensional range 
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Yn, and define a random variable Un '■= TrJJ taking values in y„. This leads to the 
computational model 

(4) Mfc„ = AkUn + Sk 

involving two independent discretizations: Pk is related to the measurement device 
and T„ to finite representation of the unknown. 

In the above application related to meg, the projection T„ corresponds to an 
approximate representation of the electromagnetic sources in the brain using a finite 
set of basis functions (defined for instance according to a finite element method). 

Note that the model (jlj) is virtual in the sense that f/„ appears neither in the 
continuum model ([T]) nor in the practical measurement model (Ej). In particular, 
measurement Mkn^oJo) is related to the computational model but not to the practical 
measurement model. This is why we use ruk = Mk{uJo) as the given data. 

Denote the probability density function of the random variable Mkn by Tfcn(mfc„). 
The posterior density for Un is given by the Bayes formula: 



(5) TTkniUnlmkn) 



Un{un) exp(-i||mfc„ - AkUnWl) 



^knijlilkn) 

where the exponential function corresponds to (HI) with white noise statistics with 
identity variance, and a priori information about U is expressed in the form of a 
prior density n„ for the random variable Un- The density n„ assigns high probability 
to functions that are typical in light of a priori information, and low probability to 
atypical functions. 

We can now state the inverse problem ([3]) more specifically: 

(6) given a realization = Mk{uJo), estimate U by u^n, 

where the conditional mean (cm) estimate (or posterior mean estimate) Ukn is 



(7) Ukn ■= / Un TTkniUn \ rUk) dUn- 

Jy-a 

Note that formula ([7]) differs from the conventional definition of posterior mean 
estimate since it involves = M^ioj^s) instead of m^n = M^niuJo). 

The estimate u^n and confidence intervals for it are typically computed approx- 
imately with Markov chain Monte Carlo (mcmc) methods involving simulation 
software for the finite model (jl]). The solution strategy has been applied to 
image restoration [7], geological prospecting [SU\ \^, atmospheric and ionospheric 
remote sensing [21 [221 E21 SEl EO], medical X-ray tomography [201 EHl E] and elec- 
trical impedance imaging [HTl E]. For general reference on Bayesian inversion see 

[S31I1H112Z1E311I3]. 

We remark that applying the MCMC solution strategy requires also practical im- 
plementation of the operator Ak- In the case of meg imaging Ak corresponds to 
computing the electromagnetic field AkUn using the discrete current [/„ and an 
approximate numerical solution to Maxwell's equations. We neglect the effects of 
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numerical error in the implementation of Ak in this paper. (One possibility to take 
this error into account is to use the approximation error model of Kaipio and Som- 
ersalo [331, Section 5.8], but this leads to non-Gaussian noise statistics in general and 
thus falls outside the scope of this discussion.) 

Summarizing, our starting point is the infinite-dimensional continuum model ([T]). 
A measurement instrument provides us with finite-dimensional noisy data rrik = 
Mk{uJo) described by the practical measurement model ([2]). Our aim is to use 
to find information about the unknown U. To allow computerized inversion we 
construct the fully discrete computational model (jll) involving a priori information 
about U, and we write down the posterior distribution ([5]). Finally, we use formula 
(jTj) and numerical methods to estimate U with Ufe„. However, in definition the 
data rrik comes from the practical measurement model ([2]), while TTkn is related to 
the computational model (jll). Taking this incompatibility into account is one of the 
central novelties in this work. 

Constructing T„ and n„ is the core difficulty in Bayesian inversion. Often there 
is no natural discretization for the continuum quantity U, so n can be freely chosen. 
Consequently, T„ and should in principle be described for all n > 0, or at least 
for an infinite sequence of increasing values of n. Also, updating our measurement 
device may increase k independently of n. This work is motivated by the need to 
avoid the following unwanted phenomena: 

(a) The estimates Ukn diverge as n ^ oo. In this case investing more com- 
putational resources to modeling our unknown does not necessarily result in 
improved reconstructions. 

(b) The estimates Ukn diverge as oo. In this case performing more 
measurements may lead to worse reconstructions. 

(c) Representation of a priori knowledge is incompatible with dis- 
cretization. It is reported in [39] that discrete (non-Gaussian) total vari- 
ation priors converge to a Gaussian smoothness prior as n ^ oo. In this 
case one makes the mistake of specifying different a priori information for 
different values of n. See Appendix iBl below for more details. 

A choice of T„ and II^ is called discretization-invariant if it avoids (a)-(c). 

We construct prior distributions for U in the infinite-dimensional space Y. Then 
the random variable f/„ = Tnll takes values in the finite-dimensional subspace Yn C 
Y and represents approximately the same a priori knowledge as U. Further, we 
analyze convergence of u^n using a deterministic function called reconstructor that 
almost surely maps a given measurement to the conditional mean estimate. For 
example, the reconstructor 7lMkn{Un\ ■ ) corresponding to the computational model 
(jl]) takes the measurement data rrik = Mk{uo) to the mean Ukn defined in ((71). The 
infinite-dimensional model M = AU + E has a reconstructor 7?,m(^| ■ ) as well. 
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Theorem [7] below states under suitable assumptions on U, T„ and Pk that 

(8) lim TZMkAUnlrrik) =TZM{U\m), 

where the realization = Pkm comes from a realization m = M{ujq) of the random 
variable M. 

Our proving strategy involves another measurement model analogous to (jlj): 

(9) Qkn = AkUn + 

where the noise is not finite-dimensional. The noise models ([I]) and ([9]) now contain 
the same (continuum) white noise process. This allows us to prove convergence 
results in a fixed function space. Theorem [5] below states under suitable assumptions 
on U, Tn and P^ that if lim^^oo "^fc = ^ then 

lim ne^^{Un\mk) ='RM{U\m). 

n,k—fOO 

Formula ([H]) follows by showing that the reconstructors coincide: TZskniUnl^k) = 
7lMkn{Un\mk) for G Ran (Pk)- We will consider also more general reconstructors 
'J^M{g{U)\m) that can be used to analyze convergence of confidence intervals. 

Our way of using infinite-dimensional limit processes is one possibility to achieve 
discretization-invariance since we avoid problems (a)-(c). 

We are especially interested in discretization-invariant and edge-preserving Bayesian 
inversion. Total variation regularization of Rudin, Osher and Fatemi [57] penalizes 
the norm of derivatives and yields edge-preserving reconstructions in practice 
[201 [HH [3ll [601 EH]- These results are equivalent to computing maximum a poste- 
riori estimates using a total variation prior and a Gaussian likelihood. However, 
Bayesian inversion with discretized total variation prior distribution, 

(10) nc7„(u) = c„exp(-a„||VM||Li(o,i)), m e F„, 

where ¥„, C 1Vq^'^(0, 1) are finite dimensional subspaces, is not discretization-invariant 
[5U] : in particular, conditional mean estimates (i.e. posterior mean estimates) lose 
their edge-preserving quality as n — cx). 

Wavelet-based Bayesian inversion using Besov space priors is used in applications 
with results similar to total variation regularization, see [HI EB El]- The Besov 
space Bl^ that bounds norms of (band-filtered) first derivatives similarly to ffTUl) 
is useful for image processing, see Meyer [37]. One of the main results of this paper 
is that Besov priors are discretization- invariant; we emphasize that this gives us 
non-Gaussian discretization-invariant priors. The proof is based on the analysis of 
the infinite-dimensional limit case and includes a quantitative estimate on the speed 
of convergence of reconstructors. Further, we show in Section 12.21 for the special 
case of two-dimensional deblurring that Bl^ inversion using Ufc„ reduces to applying 
£^-type prior on the wavelet coefficients of [/„ — a combination of two well-known 
computational methods. See Section 12.21 for more details. There is an interesting 
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parallel to algorithms computing MAP estimates with penalty on the norm of 
Fourier or wavelet coefficients [2T1 [611 [T71 H2] . 

Let us review previous literature on the topic. The study of Bayesian inver- 
sion in infinite-dimensional function spaces was initiated by Franklin ^29j and con- 
tinued by Mandelbaum [15], Lehtinen, Paivarinta and Somersalo [lOj, Fitzpatrick 
[28] . and Luschgy [H]. The concept of discretization invariance was formulated by 
Markku Lehtinen in the 1990's and has been studied by Lasanen [37], Piiroinen 
[53] . D'Ambrogi, Maenpaa and Markkanen [3], and Lasanen and Roininen [38]. A 
definition of discretization invariance similar to the above was given in [39]. For 
other kinds of discretization of continuum objects in the Bayesian framework, see 
[niEO]. The use of wavelets and Besov spaces in statistical algorithms is discussed 
in [H El [HI [HI [in [19] . For regularization based approaches for statistical inverse 
problems, see [25l [261 El [M]- The relationship between continuous and discrete 
(non-statistical) inversion is studied in Hilbert spaces in ^J. See [llj for special- 
ized discretizations for inverse problems. 

We remark that working entirely within the computational model (jl]) would re- 
quire using a realization Mfc„(u;o) of the random variable M^^ instead of = 
Mfc(a;o) in formula ([7]). The starting point of inversion in earlier studies of discretiza- 
tion invariance is indeed the random variable M^^ or its realization. However, the 
appropriate model of data given by the measurement device is a realization re- 
lated to model Rigorous analysis of this incompatibility is a central novelty in 
this paper. To emphasize this aspect we show that inversion from using Gauss- 
ian smoothness priors or Besov space priors is discretization- invariant. We do not 
discuss non-Gaussian noise models in this paper. 

This paper is organized as follows. In Section [2] we discuss discretization- invariant 
Bayesian inversion using Gaussian and Besov priors. In Section [SI we present the 
general theory of discretization invariance. In Section [H we discuss random variables 
with values in Besov spaces. After proving some technical results in Section [31 we 
apply them in Section [HI to prove convergence of reconstructors arising from Besov 
priors. In Appendix IHl we consider examples. 

Below (■, ■) refers to pairing of either generalized functions with test functions, or a 
Banach space with its dual. We denote by (■, ■)x the inner product in a Hilbert space 
X. The notation L{X, Y) stands for the space of bounded linear operators between 
Banach spaces X and Y, and L{X,X) is abbreviated as L{X). We occasionally 
denote the norm || ■ ||L(x,y) by || • The specific element cuq ^ ^ of the 

probability space denotes realizations of measurements throughout the paper. 
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comments helped us to improve the paper. ML and SS were supported by National 
Technology Agency of Finland (TEKES) under Contract 206/03, Finnish Centre of 
Excellence in Inverse Problems Research and PaloDEx Group. ES was supported 
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2. Example: discretization-invariant deblurring 

In this section we give examples of discretization-invariant prior distributions and 
consider a simple inverse problem to give a flavour of our results to the reader. The 
precise definitions in a more general setting are postponed to the later sections. 

We discuss Bayesian deconvolution using a Gaussian smoothness prior and a Besov 
space prior. In both cases we define the prior distribution in the continuous con- 
text and then marginalize it to discrete cases to allow practical computation. The 
Gaussian case is shown to be related to deterministic Tikhonov regularization with 
derivative penalty. 

For simplicity, we ignore boundary effects by considering periodic functions. The 
loss of generality is not too bad from the practical point of view since the periodic 
analysis covers compactly supported non-periodic cases. 

Let be the two-dimensional torus constructed by identifying parallel sides of 
the square D = (0, 1)^ C M^; we model periodic images as elements of function 
spaces over T^. The continuum model is M = AU + £ with convolution operator A 
defined by 

(11) Au{x)= / <^{x~y)u{y)dy, 
where $ G C°°(T^) is a point spread function. 

2.1. Gaussian smoothness prior. For any s G M, let H^iT"^) be the L^-based 
Sobolev space equipped with Hilbert space inner product 

(12) (0,V^)^.(Tr.)= / {{I - /\yl'<p){x){{I - /\r"^){x)dx. 

Note that H^iT"^) = L\T^). 

Recall that a generalized Gaussian random variable V takes values in the space 
of generalized functions, and the pairing {V,(f)) with any test function (j) e C°°(T^) 
is a Gaussian random variable taking values in M, see [SS]. The Gaussian random 
variables we will consider below are assumed to take values in some Hilbert space, 
typically in a Sobolev space if*(T^), where the smoothness index s G M may also 
be negative. Now, if V takes values in a Hilbert space X we say that V has the 
covariance operator Cy : X — > X if 

(13) E{{V-EV,<P)x{V-E V, ^)x) = {Cv<P, ^)x, 

with any (j),ip E X. Here (■, ■)x stands for the inner product in X. 

Next we analyze a simple measurement model as an example. Consider the con- 
tinuum model M = AU + 8 where the convolution operator (fTTl) is now viewed as 
a smoothing map A:H'{T'^)^ C°°(T2). 
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We construct the smoothness prior by choosing [/ to be a generahzed Gaussian 
random variable taking values in /7~^(T^) and having expectation Ef/ = and 
covariance operator Cu = a~^{I — A)^^. Here a > is a regularization parameter. 
The operator Cu corresponds formally to the prior 

T^u{u) = cexp(--||M||^i(Tr2)) 

jormally Z 

and generates the discrete smoothness priors widely used in practice. However, it is 
curious that in spite of the term smoothness prior the realizations of U are almost 
surely not even L^(T^) functions, let alone different iable. This is why we need to 
consider U as taking values in some space if*(T^) with negative smoothness index 
s; the value s = — 1 is chosen just for convenience. 

White noise £^ is a generalized Gaussian random variable with expectation KS = 
0. Let us discuss the choice of an appropriate covariance operator: the standard 
definition of white noise as a generalized random variable is that 

(14) E{{£,4>){S,ij)) = {^,i:), forall0,V^GC°°(T2), 

where (■, ■) denotes the pairing between generalized functions and test functions. To 
consider the white noise £^ as a Hilbert-space- valued random variable, we can choose 
the Hilbert space to be any Sobolev space if'*(T^), s < —1. One possible choice 
is to consider S as taking values in if~^(T^) and choose the covariance operator 
Cs = {I - A)-2 : H-\T^) H-^{T^) as defined in (pS]). Note that realizations 
of S belong to L^(T^) only with probability zero, and this is why we need to use 
Sobolev spaces with negative smoothness indices s. 

Now the continuous framework for inversion is in place. Let m = M{uJ()) be 
a realization of the measurement M = AU + S. Since both the prior and noise 
statistics are Gaussian, the conditional mean estimate coincides with the location of 
the maximum of the posterior density. Thus we can evaluate the CM estimate u as 

(15) u = argmax {exp{-]-\\Au\\l2n2) + {m, Au) - {C^^u, u) h-^It^))} . 

We omitted in formula f|T5l) the constant term ||m||^2(']r2) in the formal expansion 

(16) ||m - Am||^2(t2) = ||Am||^2(t2) - 2(m, Au)l^(^t2) + ||m||^2(T2), 

which is well-defined only when m G L^(T^), and this happens with probability 
zero. Also, the smoothing properties of the operator A make it possible to replace 
the formal quantity {m, Au) l2(j2^ in f|T6l) by the rigorously defined pairing {m,Au) 
in f|T5|) . Note that we can write (ITSl) in the form 

X (y. 
u = argmin {-\\Au\\l2^j2) - {m,Au) + -||(/ - A)^^^u\\l2^j2)} . 

The practical measurement model is Mk = PkAU + PkS-, where Pk '■ -L^(T^) — > 
L^(T^) is an orthogonal projection with /c- dimensional range. We require that the 
sequence Pk converges strongly to the identity operator on L^(T^) as A; ^ oo. For 
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example, Pk may measure averages of AU over k pixels on and construct a 
piecewise function or compute a truncated Fourier series expansion to get an element 
of L^(T^). Here k can be arbitrarily large, enabling models of imaging devices with 
any resolution. 

Let us now turn to practical inversion where all appearing quantities are finite 
dimensional. We need to discretize the unknown. Take T„ : if~^(T^) H^^{T^) 
to be truncations of Fourier series expansions to n lowest frequency terms. Then T„ 
are linear orthogonal projections with n-dimensional range Yn converging strongly 
to the identity operator on if~^(T^) as n — > oo. Let U be as in the continuum case 
above and define a random variable f/„ := T„f/ taking values in Yn- The conditional 
mean estimate for f/„ (determined using the posterior distribution corresponding to 
model Mkn = PkAUn + Sk) is 

(17) Mkn ■= argmin {^\\PkATnu\\l2(j2) - {nik, PkATnu) 

Mey„nHi(T2) ^2 ^ ' 

where (TnCuTn)~^ is the inverse of TnCuTn : V„ Yn- Here, = Mkiujo) is the 
realization of measurement Mk in the practical measurement model. 

Theorem [5] below implies in particular that Ufc„ \i as, k,n oo, showing that 
Bayesian deblurring with Gaussian smoothness prior is discretization-invariant. 

Much of the material in this Subsection 12.11 is due to Lasanen [37j and Piiroinen 
[53] . We emphasize that we assume given a realization mk = Mk{uJo) from practical 
measurement model and that we do not have available any realization of Mkn- 
Because of this, rrik is used in formula ffTTl) . This is the main novelty of our Gaussian 
example compared to ^7\. 

We close this section by discussing a connection to generalized Tikhonov regular- 
ization, here defined as finding the element in -u G F„ fl if^(T^) that minimizes the 
functional ^ 

-||mfc — PkAu\\l2(j2'^ + — ||M||^2('ir2) + — II VM||^2(Tr2). 
After similar modification as above we can define the Tikhonov solution by 

(18) ul^ = aigmin lhPkAu\\l2(j2)-{mk,PkAu) 

The quadratic forms defined in f|T7|) and f|T8l) are the same, so u^^ = u^„. Thus our 
results apply to the convergence analysis of Tikhonov regularization as well. 

2.2. Besov prior. Gaussian smoothness priors are designed for representing the 
prior information that the unknown physical quantity U does not vary sharply. 
However, sometimes we know a priori that U is piecewise regular, and other kind of 
priors are needed. One could use Gaussian hyperpriors as in [32l [12] or geometric 
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priors as in [50]. We discuss a different approach that allows analysis of an infinite- 
dimensional hmit case and consequently leads to discretization-invariant inversion. 
We replace the discretization-dependent [39] total variation prior 

n(u) = cexp(— a|| Vm||li) 

formally 

by the discretization-invariant Besov space prior 

n(ti) ^ =„ cexp(-a||M||Ri (T2)). 

formally '-^ 

Since the norm of Bli{T'^) bounds the norms of (band-limited) first derivatives 
of u, we expect that the Bl^ prior can be used for edge-preserving inversion. It is 
computationally convenient that Bl^ functions can be written using a compactly 
supported wavelet bases and the Bl^ norm can be computed as a weighted sum of 
wavelet coefficients, see Appendix El 

The continuum measurement model is M = AU + S where the linear convolution 
operator ffTTl) is considered as a smoothing map A : i?J]^(T^) —>■ C°°(T^). 

We construct the i?;[^(T^) prior by expanding U in the wavelet basis as the sum 

oo 

1=1 

with each random coefficient distributed independently according to nx{x) = 
cexp(— |x|), where c = 2e~^ is the normalization constant. This distribution arises 
naturally due to the wavelet characterization of Besov spaces. The scale of the 
wavelet basis functions becomes finer when i increases; for the exact bookkeeping 
of scales and locations of ipi as function of i we refer to Appendix |X1 
We take T„ : _Bj^(T^) Bl-^^iT"^) to be the finite-dimensional projections 

(oo \ n 

i=i / i=i 

that simply truncate the wavelet expansion to n first terms. These operators T„ 
converge strongly to the identity as n — > oo. For each n, define a random variable 
Un := TnU taking values in F„ := spa.ia{ipi, . . . ,ipn) and consider the model Mkn = 
PkATnU + Sk where : L^(T^) L^(T^) is an orthogonal projection with k- 
dimensional range. 

With the above choices the posterior distribution of f/„ takes the computationally 
efficient form in terms of the wavelet coefficients X" := (Xi, X2, . . . , X„) of Un, 
namely, the probability distribution of X" conditioned with M^n is 

n 

^ ^ \Xi\ 



(20) 7rxn|A4„(a:i, . . . ,a;„,mfc) = Cexp - -||mfc - A ^ x^V^£||^2(Tr2) " « 

£=1 e=i 

The conditional mean estimate can be computed approximately e.g. using a Markov 
chain Monte Carlo algorithm such as the Gibbs sampler or the Metropolis-Hastings 
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method. Sampling from distributions is explained e.g. in [l3l[3T] and [33l Section 
3.3.2], and modifying those methods for computing the mean of ( 1201) involves only 
adding the fast wavelet transform appropriately. 

Our new results below show that Besov priors are discretization-invariant. First, 
by Theorems [7] and [19] and Corollary [20] the estimates Ukn converge when n,k oo 
either separately or simultaneously. Thus Besov priors do not involve the possible 
difficulties (a) and (b) mentioned in the introduction. Second, the prior distributions 
defined in Yn converge to a limit distribution in F as n — »• cxo (Proposition [TT]) . Thus 
we do not have the unwanted phenomenon (c) of the introduction. 

3. General theory of discretization invariance 

Consider two independent random variables U and S and the measurement model 
M = AU + £. We construct a rigorous stochastic framework for discretization- 
invariant Bayesian inversion using the following diagram of spaces and maps: 

Y ^ C S^I^ = Z c S 

(21) w u 

f/(^i) £M 

Our immediate task is to define all the objects in (121]) . 

Let (f2, S,P) a complete probability space with product structure: f2 = i7i x 
and S = El O E2 and P = Pi ® P2. 

Recall the notion of a random variable taking values in a Banach space. For any 
given separable Banach space X we denote the dual of X by X' . Let Bx be the Borel 
(T-algebra of (X, t^) with t"^ the weak topology of X. Note that the separability 
of X verifies that Bx coincides with the Borel a-algebra of (X, r"), where r" is the 
norm topology of X. An X-valued random variable V is simply any measurable 
map V : (fi, S) — > {X,Bx)- In this paper we consider only random variables with 
values in separable Banach spaces. 

We now assume that the space Y in the diagram ([^T]) is a separable Banach space 
and that U = U{uJi) is a F- valued random variable. 

The measurement noise S = £{uj2) with uj2 G ^2 is a Gaussian random variable 
taking values in a separable Hilbert space S] the expectation satisfies E£^ = 0, and 
the covariance operator : S ^ S, is defined by requiring 

E {{si,£)s{£, 82)3) = {si, CeS2)s for si, S2 G S. 

We assume that the essential range of £ is dense in S. Then is one-to-one, self- 
adjoint and in the trace class, and we may define the unique positive and self-adjoint 
power for any t G M. For t > we denote 5* := C\S] henceforth the spaces 5*^^^ 
and in (!2T]) are well defined. 
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The space Z = S^^'^ = C]- S is called the Cameron- Martin space of and 8 is 
called the white noise process in Z. We remark that the realizations of £ belong to 
Z only with probability zero. Note that C^- : 5* — > 5* is an isomorphism when the 
norm of the dense subspace S"* C is defined as HC^-mUs* := \\u\\s- The domain of 
definition of C^* is 5**, and the map C^* : 5* — > 5* is an isomorphism. Then 

In the Gaussian example in Section 12.11 we have = H~'^~^^^{T'^) so that 5* = 
5-0 = H~^{T^) and Z = L'^{T^) and we may choose Y = H'\T^). Considering 
S'*"'"2 c Z G 5*^*^2 as a Gel'fand triple, where S^~^'2 and 5*^*^2 are considered as 
dual spaces with the pairing {w, z) gt+i/2y,s-t+'L/'2 = {C^^w^C^gz) z, it then holds that 

E((£^,2;i)5"x5i(^, ^2)50x51) = {zi,Z2)z for 2:1,^2 e 5"^ 

Finally, we assume that A : Y ^ is a, bounded linear operator. The defi- 
nition of the objects in diagram (1211) is now complete, and we turn to discussing 
reconstructors. 

We analyze finite- and infinite-dimensional Bayesian inversion simultaneously, so 
let us introduce a rigorous setting for discrete approximations to the random variable 
U above. We say that F-valued random variables Un tend weakly in distribution 
(w.i.d.) to U if 

lim {Un,y') = {U,y') in distribution 

for every y' G Y'. Note that here "weakly" refers to the weak topology used in space 
Y, and "distribution" to the convergence of scalar-valued random variables. 

Definition 1. (Linear discretization of random functions) Let Y be a separa- 
ble Banach space and Yn G Y be finite- dimensional subspaces. The spaces Yn need 
not be nested. Let Un = Un{uj) he Yn-valued random variables with n — 1,2,3,.... 
Assume that 

(1) There is a Y -valued random variable U = U{uj) such that 

lim Un = U w.i.d. 

n— >oo 

(2) There are bounded linear operators Tn G L{Y) such that 

Un = TnU 

Then we say that the Un, n>l, are proper linear discretizations of U in Y. 

Examples of random variables that form or do not form proper linear discretiza- 
tions are given in Appendix [B1 

In following, we mainly consider cases where T„ are projection operators. 
Recall the following definition from [59l II. 7]. 
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Definition 2. Let U G L^{Q,'E;Y) be Y -valued random variable and Sq a sub a- 
algebra ofT,. Then the conditional expectation E(f/|So) of U exists with respect to 
Eq. That is, E(f/|So) G Sq;!^) and it satisfies 



Note that all vector-valued integrals in this work are standard Bochner integrals. 
We refer the reader to [TH] for definition and basic facts on Bochner integral and 
vector- valued conditional expectations. The operator P : U ^ E(f/|So) is a pro- 
jection P : S;y) Eg; 1^), where L^(r2, Eo;V) denotes the space of 
measurable functions V : {Q, Sq) — > {Y, r") which are Bochner integrable. 

Now we are ready to give the definition of a (non-unique) reconstructor. 

Definition 3. Denote by M. <zTi be the a-algebra generated by the random variable 
M{uj). We say that any deterministic function 



If Y is a separable Banach space and g : Y ^ Y is a measurable function, we define 
T^Ai{g{U) \ ■ ) : S Y to be any deterministic function satisfying 



Note that reconstructor is a deterministic function. Also, if a realization of the 
measurement in the computational model, Mfc„(a;o), is substituted in the reconstruc- 
tor 71m{U\ ■ ), then the obtained result 71m{U\ Mkn{u!o)) coincides with conditional 
expectation. Thus, reconstructor, considered as a deterministic function, is just the 
functional representation of conditional expectation (see e.g. (SH], section II. 7, for- 
mula (43)). However, we assume that we are not given a realization of the measure- 
ment in computational model, M^niuJo), as data, but instead Mfc(c(Jo) = PkM{ujQ), a 
realization of the measurement in the practical measurement model ([21). An essen- 
tial feature of the reconstructor is that it is defined for all elements in S. Thus, even 
though the reconstructor is related to the computational model (jl]), it is possible 
to substitute into the reconstructor the realization of the practical measurement 
model ([2]). This is the reason why the reconstructor, which has the same functional 
representation as conditional expectation, is defined as a new concept. 

One may generalize the well-known scalar-valued result on the existence of recon- 
structor, see II. 3 Theorem 3 and II. 7. 5], to Bochner- valued conditional expecta- 
tions. However, we will not need this since we next establish a specific formula for 
a reconstructor in our situation. Note that the following result is close to the usual 
functional representation of the conditional expectation, c.f. [59]. As we need to 
define reconstructors as a deterministic function of the space S and also introduce 
notations for later use, we present the proof of the result for completeness. 




for all D G Sq. 



TZm^UI ■ ):S ^Y, Hm{U\ m), 

is a reconstructor of U with measurement M if 

nM{U\M{uj)) = E {U\M){uj) almost surely. 



nM{g{U)\M{u)) = E {g{U)\M){uj) almost surely. 
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Theorem 4. Denote by fi : Bs [0, 1] the distribution of S in S, and set fia{E) = 
li[E — a) for a E S , E E Bs- Let g : Y Y be a measurable function satisfying 
¥.\\g{U)\\Y < oo. Set for me S 

g{u) exp(-^||An||| + {C^^Au, m)s) dX{u), 



(22) 



( 



-- [ exp{-l-\\Au\\l + {C^'^Au,m)s)dX{u) 
Jy 2 

where A stands for the distribution of U in Y . Then the function 

Hjj ^f(m) 

(23) nM{g{U)\m)= J^') : , meS. 



U,M 



m] 



is well-defined and satisfies 7lM{g{U)\M{u)) = K {g{U)\M.){u!) almost surely, that 
is, formula (E^j defines a reconstructor. 



Proof. Using the equahty M = AU + S, and Fubini theorem, we have for any 
measurable D = ExFcSxY 



P({(M,f/)GD}) 



US 



Xe i^') Xf ) ) dfiAUiu^i) (m') dFi {lui ) . 



Above, m' is an integration variable running over the space 5* where M is taking 
values. Thus we have for any integrable function f : S x Y ^ C 



(24) 



E(/(M,t/)) 



f{m',U{uJi))diJ,AU{L,i)im) 



dPi(^i). 



One checks that the same holds for any Bochner integrable function f : S xY Y 
by simply using the fact that such an / is an almost sure limit of simple functions 
fk that satisfy the pointwise inequality ||/A:||y < 

Since Z is the Cameron-Martin space of we have for any a G S*^ by [ini Cor. 
2.4.3] the Radon-Nikodynfl derivative 



(25) 



dfi 



[m] = exp{--\\a\\z + {C^ a,m)s). 



The latter formula has the advantage of being well-defined for every m e 5. In 
particular, we have 



(26) 



dfi 



m) 



exp 



-\\AU{u,)\\l + {C^'AUM,m)s 



^To motivate formula (j25p . we note that if Z would be finite-dimensional, we could write 
^(m) = exp(-i||TO - a||| + i||r7i|||) = exp(-i||a||| + {a,m)z). 
The formula (j25p is a generalization of this for the infinite-dimensional case. 
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Using formula ( l26|l we see that formula ( l22i) can be written as 



dfJ'AU{LUi) 



m) ciPi(tUi). 



By Fubini theorem we may continue from ix2M to obtain 



^{f{M,U)) = 
Especially for E C Bs it holds that 
E{xEiM)giU)) 



dfiAU{uJi) 

dfi 



dfi^m'). 



m) d¥i{uji) 



djjiirn'). 



Let v : Bs ^ [0, 1] be the measure v^E) = P(M ^{E)), that is, z/ is the distribution 
ofM. Now 



du{m')=E{xE{M)) 



djj, 



{m') d¥i{uji) 



dnim!) 



and thus u « /i and ^{rn) = Hljj^j{m) for almost every m with respect to /i, where 



Hh,Mim) 



dfiAU{uji) 

dfi 



m) dVi{uJi). 



Observe that HIj ^{m) > for all m G S* and thus z^(-E') = if and only if fi{E) = 0. 
Hence also ^ <^ u and 



dfi ^ 
dv 



ydjjL 



is well defined. Now by (l24]) 

Je Jn^ dfi dfi 

Hence by Fubini theorem ioi E & Bs 
EixE{M)giU)) 

g{U{u^))^^^^{m')dF,{u,] 



dfi 



,du 
' djjL 



m')) dv{m') 



Xe{M{uj)) 
Thus we have the almost sure equality 
E{g{U)\M){u)- 



giUM)'^^^^P^{M{uj))dV,M 



dfi 



giU{u,))'^^^^p^{M{u))dF,M 



dfi 



;^(MM))-Mp(c.). 

dfi 



(|(MM,)-'. 



This verifies the formula for TlM{g{U)\m) given in the assertion. 



□ 
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For convenience, let us look at Theorem H] in the finite-dimensional case. Then 
y = R" and Z = S = M"*, £ is the Gaussian white noise with the identity covariance 
matrix, and U and M have smooth everywhere positive probability density functions 
Tiuiu) and 7iM{m), correspondingly. It follows that 

(27) 7^M(f/H = (27r)-"/2 / uexp{-hAu-m\\l)^^^du 

satisfies the assumptions of Definition [3l Compare formulas ( l23l) and ( |27I) and 
see Appendix O for further discussion. Note that (|271) is widely used in practical 
Bayesian inversion [63l [33l [I3] . 

Stability of the reconstructor with respect to data m is important from the point 
of view of practical inversion. The following theorem yields a non-quantitative 
convergence result for reconstructors in general. We provide sharper results for 
the special case of Besov priors later in sections [5] and El 

Theorem 5. Assume that the exponential moments of U satisfy 

(28) E(exp(A||t/||y)) < oo for all A > 0. 
Take g : Y W to be such a continuous function that 

(29) |fi'('w)| ^ aexp(a||'u||y) for u eY 

with some constant a. Assume that Tn : Y Y , n > 0, and Pk '■ S-^ —>■ S-^ , k > 0, 
are linear projections satisfying 

(30) lim \\T„y - ylW = for all yeY, 

(31) lim \\PkZ — 2 II 51 =0 for all z G Ran{A), 

(32) ll^nllL(y) < Co, ||-Pfe||L(5i) < Co for all n,k 
with some Co > 0. Finally, let mk,m G S satisfy 

(33) lim rrik = m in S. 

Then we have the convergence 

lim TZe,„{g{Un)\mk) = nM{g{U)\m), 

where the reconstructors are defined using formula [2^) for models ^ and 
respectively. Moreover, the limits 

lim 7^e,„(5'(f^r^)|"^fc) and lim Tie ^^Jg{Un)\mk) 

n — >oo k — ^00 

exist for a fixed value of k (resp. n). 
Proof. We have 



,mfc)rfPi(c^i) 
djj 
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and since ( l28ll and ( l29ll imply E |5f(T„[/)| < oo we may also write 
Above 

^^^^^^(m,) = exp(-l||P,AT„f/(.;OIII + (C^r'^fc^W^i),^^)5) 
< exp(c||t/(c<Ji)||y) for all A;,n, 

by our assumptions. The claims follows now from the Lebesgue dominated conver- 
gence theorem by applying the majorant aexp((c + aCo)||f/(u;i)||y). □ 



Now the general theory of discretization-invariant Bayesian inversion is in place 
for the case of measurement models M = AU + S and Qkn = PkATrJJ + 8 concern- 
ing infinite-dimensional noise £. Using the continuum noise £ is convenient above 
because we can work in the same function space regardless of k. 

Assume given data Mk{uJo) corresponding to the practical measurement model 
(E]) and consider the computational solution of the inversion problem using the 
computational model (jll), where random error is finite-dimensional white noise £k = 
Pk£- It remains to discuss the implications of our general theory for these practical 
models. To do this, assume that are projections P^ '■ S ^ S having the following 
properties: 

(34) Ran (Pfc) is a finite-dimensional subset of S^, 

(35) (Pfc0,^)z = (0,Pfe^)z for0,V'eZ. 

First we show that reconstructors corresponding to the measurement models Qkn = 
PkAUn + £ and M^n = PkAUn + Pk£ actually coincide. 

Lemma 6. Assume Pk '■ S ^ S is a projection satisfying [34\ ) and [3^) . Then the 
reconstructors defined in Theorem^ for the measurement models Qkn = PkAUn + £ 
and Mkn = PkAUn + Pk£ satisfy 

ne,„{g{Un)\mk) = nM^Jg{Un)\mk) 

for nik e Ran{Pk). 

Proof. Consider first £k = Pk£ as a Gaussian random variable taking values 
in the space Ran (Pk) that has the inner product inherited from S. The random 
variable £k has zero expectation. 

Consider S and as dual Hilbert spaces. The corresponding pairing is 

which is an extension of the pairing 
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defined for rj E Z G S and G S"^. Moreover, for such t] and 

{PkVA)sxs^ = {PkVA)z = {v^Pk(t))z = {v,Pk(l))sxs^- 
As the finite-dimensional projection P^. : S ^ S is bounded, the density of Z in S* 
imphes that {PkV^ 4>) Sxs'^ = {v^ Pk(t>) sxs'^ fo^' all r/ G S* and G 5*^. Using this, we 
see for (j^yip & Ran (P^) 

E{{SkA)z{Sk,ip)z) = E{{PkS,^)sxs^ 

= {Pk£,ij)sxs-^) 

= E{{S,Pk<p)sxS^{S,Pk^)sxS^) 

= E{{S,C^'Pk<P)s{S,C^'Pkij)s) 
= (0,V^)z. 

This implies that the covariance operator of Sk, considered now as a Gaussian ran- 
dom variable taking values in Ran (P^) endowed with the inner product inherited 
from Z, is the identity operator. Using this we see that the reconstructor defined in 
Theorem H] for the measurement Mkn has the form 

n.,MUn)\mk) = E(exp(-ipf/.-.n.|||)) 

E{g{U)exp{-l\\AUk\\l + {AUk,mk)z)) 



E{exp{-^\\AUk\\l+ {AUk,mk)z)) 

for rrik G Ran (Pfc). The reconstructor TZQ^,^{g{Un)\'mk) defined in Theorem H] for the 
measurement Qkn has the same form, and the assertion follows. □ 

Finally, we prove the convergence of reconstructors for models with discrete noise. 

Theorem 7. Assume that in addition to conditions ^2E)-^3^, the projections Pk '■ 
S ^ S satisfy [34^ , (35\) . together with 

(36) lim \\PkZ — z\\s = for all z E S. 

k^oo 

Let u = U{ujq), e = S{ujq), ujq E Q be realizations of the random variables U and £, 
and let 

m = Au + e, rrik = AkU + P^e, 

be the realizations of the measurements ^ and respectively. Then the recon- 
structors defined in Theorem^ for the measurement models M^n = Pk^T^U + Pk£ 
and M = AU + £ satisfy 



lim nM^MUn)\mk) = nM{g{U)\m). 

n,fe— >oo 



Moreover, the limits 



lim nMkA9{Un)\mk) and lim nMk„{g{Un)\mk) 

n — 5-00 k — >OQ 
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exist for a fixed value of k (resp. n). 

Proof. By ( !36|) . linifc^oo f^k = m'mS, and hence the assertion follows by Theorem 
[5] and Lemma [61 □ 

Theorem [7| concerns the convergence of practical inversion methods: is data 
provided by an actual measurement device, and the computational model M^n = 
PkATnll + Sk allows computer implementation. For instance, most Markov chain 
Monte Carlo inversion algorithms are programmed to evaluate TZM^„{Un\Tnk). 

Let E G Y he a Borel set and Xe indicator function of E. Using recon- 

structors, we define 

V{E\m)='JZM{xAU)\m), 

Vkn{E\mk) = TZM^,XXE{Un)\mk). 

For a given m & S, the map E V{E\m) is a probability measure on Y by equation 
(l23l) . Next, let ii^ be a fixed Borel set of Y. If we substitute M{uj) in the function 
m I— >■ V{E\m), we obtain by Definition [3] 

V{E\M{uj)) = E {Xe{U)\M){uj) = F{{U G E}\M){uj) almost surely, 

where F{{U G E}\A4){u!) is the conditional probability for the event {U G -E} 
with respect to the a-algebra A4 (see [351 Sec. 6]). Thus, roughly speaking, m i-^ 
V{E\m) can be considered as the posterior probability for the event U & E when 
the measurement M gets value m, and Vkn{E\mk) the posterior probability for the 
event Un E E when the measurement Mkn gets value rrik- 

Recall that measures /ij on Y convergence weakly to measure /i as j ^ oo if 
limj^oo Jy 9 dfJ^j = Jy ddfi for all bounded continuous functions g : Y ^ M.. Thus 
Theorem [7] yields the following corollary. 

Corollary 8. Let the assumptions of Theorem hold. Then the Borel measures 
E ^— Vkn{E\mk) converge weakly to the measure E V{E\m) as n, k oo. 



4. Random variables in Besov spaces 
We wish to use priors of the form 

n(M) ^ =„ cexp(-||M||^, , 

formally -"ppV" ; 

for any integrability parameter 1 < p < oo and some chosen smoothness s G M. 
Note that we consider now functions on a general (i-dimensional torus T''. 

Following Appendix |X1 we use a compactly supported wavelet ip{x), x G M'^ 
and scaling function suitable for multi-resolution analysis of smoothness 
in L^(M'') with large enough r. Then we can expand functions as 



f{x) = J2ciMx), a;GT'^ 
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and / e B'pp{T^) if the norm [17] 

i/p 



(37) \\J2^iM^)\\Bs^m ■■= Y.^^^^"^'"' 



is finite. For the exact bookkeeping of scales and locations of as function of i we 
refer to Appendix 1X1 

Definition 9. Let 1 < p < oo and s G M. Let (X^)^-^ he independent identically 
distributed real-valued random variables with probability density function 

(38) TTx (x) = Cp exp(—|x|^), with Cp = (yj^exY>{—\x\^) dx) ^. 

Let U be the random function 

oo 

U{x) = J2 r(^/''+^/'-^/P)X,^,(a;), X G T'^. 

Then we say that U is distributed according to a B^^ prior. 

Next we show that random variable U in Definition [9] is a well defined object. 

Lemma 10. Let U be as in Definition [2I and take t G M. The following three 
conditions are equivalent: 

(i) ||f/||B*p < CX3 almost surely. 

1 



n 



Eexp{-\\U\\l, ) < oo. 



(iii) t<s-^. 

Proof. Denote by (X^)^]^ a sequence of independent random variables with 
density flHHj) . Assume (iii). In order to deduce (ii) we need to show the finiteness of 
the quantity 

E 



e^p(^llE^"^^'"^'^'"'^'^^^^^llB^J ^ Eexp(5^i£(*-^)^/'^|X,r 
i=\ 1=1 

oo ^ 

= JJe exp(-r('^-*)p/'^|x^|p) 



'2 
e=i 



(39) = JJ(i - r(*-*>/V2)-i/p. 



i=i 



Above we used independence and the observation that for k G (0, 1) one may 
compute E exp (fclX^I^) = (1 — k)~^/^. Clearly the product in ( !39|) converges if 
t < s — d/p. The notation a c:^ b stands for the existence of a positive constant 
c < oo such that a/c < b < ca. 
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Observe next that obviously (ii) implies (i). Finally, assume that (i) is true. Then 

oo 
1=1 

almost surely. Since the the random variables \X(\^ are non-negative and identically 
distributed, an easy application of truncation and [351 Thm. 4.17] shows that almost 
sure finiteness of the sum implies finiteness of the expectation. Hence (i) implies 
(iii). □ 

Now we easily see that B^^iT'^) distributions generate discretization-invariant pri- 
ors. 

Proposition 11. Let U be distributed according to B^^ prior as in Definition\^ and 
let t < s — - . Take Tn : Y ^ Y to be the sequence of linear operators defined by 



then lim^^oo = y for all y E Y := i?*p(T°'). Then Un = TnU, n = 1, 2, . . . are 
proper linear discretizations of U in the sense of Definition^ 

Proof. The random variables Un = TnU converge almost surely in norm topology 
of Bpp{T'^). Since almost sure convergence implies convergence weakly in distribu- 
tion, the assertion follows. □ 

Note that if above t > ^, then the continuous embedding i?*p(T'^) C{T'^) 
implies that realizations of If and f/„ are almost surely continuous. 

5. Quantitative estimates for reconstructors 

This section studies the case that U is distributed according to S^p-prior. We 
provide quantitative stability estimates for reconstructors. For p > 1 qualitative 
results are described by Theorem [5], thanks to Lemma fTOT ii). However, the case 
p = 1 is more difficult since condition (l28l) fails. 

We collect a set of assumptions together for later reference: 

Assumption A. Let s G M and 1 < p < oo be arbitrary. Fix t, t, r G M such that 

t < t < s — d/p and r > d/2. 
Let A be a hnear operator satisfying 
(40) A: 5*p(T'^) ^S[i(T'^). 

Assume that g : B^^ 5*^ is a map such that for some q < oo we have 
\\giu)\\Bt^ < c{l + \\u\\BtJ'' and 

\\g{ui) - g{u2)\\Bi^ < c\\ui - 'U2||b^^(1 + max(||ui||Bt^, IMb^JY- 



22 



Finally, let G : B^^ B^^ be a bounded operator. 



How does the diagram (1211) look for the i?pp-prior? Take t < s — - and set 



Observe that by Lemma [lO] the random variable U takes values in the Besov space 
Y = i?pp(T'^). Above S is standard Gaussian white noise: KS = and 

for all 0, V' ^ C°°(T'^), where (■ , ■ ) is the distribution duality. To make our results 
more precise, we will consider the white noise as a random variable taking values in 
a Besov space instead of Sobolev spaces, and consider S here as a random variable 
taking values in the Besov space B^^iT'^). 

In assumption A we are particularly interested in g being a characteristic function 
= Xe{u) of some set E (corresponding to confidence intervals), the identity 
map g{u) = u (corresponding to the mean), and g{u) = H^H^t • We denote by 
H3{m, A,G) the quantity 



H'^{m,A,G) :=E (^g (GU) exp {- ^\\AU \\l2 + {AU,m))^ . 

In the case g = I the operator G plays no role, and the corresponding real-valued 
quantity is denoted simply by H^{m, A). In general, the above integral is defined as 
a i3*p-valued Bochner integral. The weak measurability of the integrand is obvious, 
whence the strong measurability follows by the separability of the spaces involved. 
The following auxiliary result will be used to verify integrability and to estimate 
the sensitivity of the quantity H^{m, A, G) with respect to changes in the variables. 
Below, for 1 < p < oo we denote p' = p/{p — 1). 

Proposition 12. Let U he distributed according to B^^ prior as in Definition\^ Let 
g > and m G B^^. Denote by Sg{m, A) the random variable 

S,{m,A) := (1 + im^^Jexp ( - ^WAUWl. + {AU,m)). 
Let w > r be an arbitrary index, which in case p = 1 satisfies w > r. Assume that 



A : 5*p(T"') ^ S]"i(T'^) zs bounded. Then there is a constant c = c{p, r, t, w, d, q) 
such that 

(41) ¥.Sg{m,A) <cexp{c{\\A\\Bi^^B^^r'^- ' 'UI"^iib, 



2r/(iy-r+2r/p') Qi^ll _^ y.w / {w-r+2r /p') 

with the understanding that in the case p = 1 one sets 2r /p' = 0. 
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Proof. In order to prove (HTl) we first observe that by Lemma fTOT ii) and Cauchy- 
Schwarz inequality it is enough to estimate the expectation 

(42) J':=Eexp(- \\ AU \\l2 + 2{AU,m)) . 

By Lemma fTOl we have that E exp(|||?7||^t ) < oo. The decomposition 

r = Eexp(i||f/||^, +(_l||f/||P^ -\\AU\\l, + 2{AU,m))) 
^ Z pp z pp 

shows that 

(43) I' <c exp ( sup S{U) 
where 



u 



SiU) :=-l||f/||^, - \\AU\\l2 + 2{AU,m). 
Z pp 

For i > denote by Ri the standard projection to the first i coordinates in the 
wavelet basis, i > 0, with Rq = and Qe = I — Ri. Fix an integer io > and divide 
further above m = Ri^m + QiqIti. 

We now assume that p > 1. By completing the square, and applying Young's 
inequality in the form — + 2xy < Cpt/^ we obtain 
1 

z pp 



S{U) = -^\\U\\l,^^ + 2{AU,Qe,m) -\\AU\\l,+2{AU,Re,m) 
1 



< ~\\Urn, +2\\U\\BjA*QeMB-;,-UU-R,,m\\h + \\ReMh 

(44) < Cp\\A*Qe,m\\P^_, + \\R^,m\\l2. 

p' p' 

If m = there is nothing to prove, so assume that m 7^ 0. Recall that w > r and 
consider first the situation where 

(45) \\A*t-^ > (||m||^-. 
where 1/p + 1/p' = 1. In this case we choose 

/ N d/{2r+p'{w-r)) 

io := [I\\aX.^^^^, \\m\\^-J ]+l 



d/(2r+p'{w-r)) 

, ,„..„ -2 \ 

,D — t M II D- 
J^ocoo " ^ ^1 ^,1 -"-^o 



\A. ||'^_,,, IItti"^ 



--'OOCX3 



We may estimate 

(46) < ^^'^^^Ia^IL-^ \\m\ 



9(r-u))/(i 

II ^ II >L 

Moreover, one easily verifies that 
(47) \\Reom\\h < {2iof'/''\\m\\' 



0000 
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By invoking the choice of £o and applying the auxihary estimates ( H6|l and ( H71) we 
compute from (jHj) that 

S{U) < cillAU^^^B^T'^^""-'^^^^^^^^ 

which finishes the proof in the situation Above, we also used the observation 
that R-t = ||yl||nt . On the other hand, if (113) is not valid, that is, 

II ll-Doooo^-Dp/p/ II ll-°PP^-°ll ' ^ V 5 5 

(48) < (||m||^- 



we apply the choice £o = in (jHj) to estimate 

(49) S{U) < cJA*mr^_, 

< r\\A*\\P' \\m\\P' 

In case w > r the last inequality above followed from inequality (l48i) . 
Consider next the case p = 1. Assuming first that 

(50) ||A*|L-,. ||mf' > 1/4 
we utilize the fact that w > r and choose 

4 := r(4||m|L-,. p*|L-» ^o-O'^/^""''^] + 1 
~ (4||m||„-. p*|L-»_„-t )^/('"-'^) 
and observe that then fl46l) verifies that || A*(5^Qm||^-t < 1/4, which in turn yields 

S{U) = -^\\U\\Bi^+2{AU,Qe,m) -\\AU\\l,+2{AU,Re,m)L^ 

(51) < -\\AU - Re,m\\l2 + \\ReM\h 

(52) < \\RiM\h 

< c(4Pb.^^^»)^^'-/(»-)(||^||^_,^)2W(--.)^ 

where we also applied the estimate ( |471) . Finally, in case ( l50|) does not hold we may 
choose i'o = above and obtain that 

s{u) < cmis^^^^s^f'/^-^'-''"^'^^ 

for all U. Summarizing, we have shown that in all the above cases 

(53) supS{U) < c^{\\AU^^^BrT^^'"-'^''^'\\M^^^^^^ 

with some Ci, C2 > 0, which finishes the proof of the Proposition. □ 

Our first application of the above result will be a local Lipschitz continuity esti- 
mate for the quantity H^{m, A, G). 
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Proposition 13. Denote 

i^' := max(||m|| R-r ,||m'||R-r ), 

^ II II i>oooo ' II II -Doooo ' ' 

a := max(p||ijt^^ij» , \\A'\\Bt^^B^^] 



• 'j^pp "^-^pp ^pp '^^pp 



Then it holds that 

(54) \\H^{m, A, G) - H^im', A', 

< + WA-A'Wb^^^^bi, + \\G-G'\\^^^^^,Jh{K,a,v), 

where 

h{K, a, v) := c{K + a+{l + v^) exp (^a(2'-/('"-'-+2'-/^''))ir(2«./(--r+2r/p')) j _ 
Here w and other indices are as in Proposition [IB 

Proof. Choose arbitrary h' G B.t^, with = 1. It is enough to estimate 

y f p/pf 

the difference 

E|/(2,f/)-/(l,f/)|, 
where for x G [1,2] and U G B^^ we have 

(55) f{x,U) = 

(^h', g{{G +{x- 1){G' - G))U) exp ( - 1||(A + (x - 1){A' - A))U\\l. + 

+ {{A + {x-l){A' - A))U,m + {x-l){m' -m))^^ 

It will be convenient to introduce for each x G [1,2] the notation Gx = G + {x — 
1)(G" - G), Ax = A + {x - 1){A' - A), and = m + (x - l)(m' - m). As g was 
assumed to be Lipschitz it follows that for each fixed U the function x ^ f{x, U) is 
also Lipschitz. Hence we may compute for almost every x G [1,2] 

DJ{x,U) = (^^{h',g{GxU)) + {h',g{GxU)){-{{A'-A)U,AxU)L^ + 



+ {AM, m-m) + {{A' - A)U, m,))) 
■exp(^-^\\AxU\\l2 + {AxU,mx)^ 



In order to estimate the right hand side we observe first that our assumption on g 
yields for almost every x G [1,2] 

\-^{{h',g{GxU)))\ < \\{G'-G)U\\s;^{l+v\\U\\^,/ 

< ||f/||^r (i + ^||f/|| )<? 

pp pp ^pp ^pp 

< (i+^)5(i + ||f/|| )5+i. 

^pp ^pp ^pp 
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Moreover, by using the observation that i?[]^(T'^) C L'^{T'^) it follows that 
I - {{A' - A)U, A,U) + {A,U, m'-m) + {{A' - A)U, m,) \ 

< \\A4b^^^^l4A' - MB^^^^L-AlUWlt^^ + \\m - m\\s-rJ\A,\\Bt^^^BrjU\\Bt^ 

+ \MB-^J^'-MBl^-.BrJU\\B^^^ 

< c{K + a){\\m' - m||^-,^ + ^^.^^^^.^(l + \\U\WJ\ 

By combining the previous bounds and the obvious bound for g{GxU) we obtain 
(56) \DJ{x,U)\ < c[K + a + {l + vr)[\\m' -m\\^-r^ + 

+ \\A'- AU^^^^^Bi, + \\G'- G\\^T^_^^.J ■ Ax). 
The Fubini theorem allows us to compute 

(e| — (/(x,f/))|)t/x 

< c{K + a + {l + vm\m' - m||^-^,^ + \\A - ^^^.^ + - G\\^,^^^.J 

^^xi -^x) dx. 

According to Proposition [12] there is the uniform bound 

(58) E Sg+2imx, Ax) < Cexp (^a^^r/{w-r+2r/p')) j^{2v,/(w~r+2r/p'))^ ^ 

and the fl54p follows immediately by combining this with (1571) . □ 

The local Lipschitz constant of a given map r : E ^ F between the Banach spaces 
E and F is defined as 

J- f \f \ r ||r(i/) -r{x)\\F 

Lip(rj(xj := hmsup for x G E. 

y^x \\y-x\\E 

In terms of this quantity the previous proposition states that 
Up{H3){m,A,G) < 



c(l + ||G||^|^_^.^)'^exp (c(||Ab.^_^» ; 



2r/{w-r-\-2r/p') ( \\^\\ ^^2w/{w-r+2r/p') 



where the linear factors containing norms of A and m were absorbed in the expo- 
nential term. Above we consider as the map 



t 

pp- 



Before we are able to give good estimates for the Lipschitz constant of the ratio 
H^/H^ we need a couple of auxiliary results. 

Lemma 14. Let a > and assume that f and F are non-negative random variables 
with EF < cxD. Then there is a constant c depending only on a so that 

(59) < c (c + log(E exp(/72)) + log(EF2) - 21og(EF))'/'^ . 
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Proof. Let us consider the change of probabihty measure ¥p that is simply 
obtained by using F as a weight and normahzing. Then the left hand side of ( |59l) 
equals E pf- By invoking the convex function (paix) := exp((l + |x|)°/a) and applying 
the Jensen inequality we obtain 

M^Fj) < ^fMj) < ^ > 

where the last inequality followed from the Cauchy-Schwartz inequality. By using the 
inequality + < c + c|x|" and applying the inverse (j)~^{x) = (a log(x))^/" — 1 
a direct computation yields the inequality 

(60) E (/F)/E F <c{c + log(E exp(c/'')) + log(E F^) - 2 log(E F)) . 

Then fl^Ul) follows by applying flUUl) on (2c) ^^/"/ in place of /. 

Lemma 15. For any t < s — n/p there is a constant C = C{t) > such that 

H\m,A)>Cit)expi-^\\A\\%^^^^,), 

independently of m. 

Proof. Since the prior measure is invariant under the change of variables U ^ —U 
we obtain that 

EH\m,A) 

= ^E (exp ( - ^\\AU\\l + {AU,m)) + exp ( - ^\\AU\\l, - {AU,m)) 

> Eexp(-ipC/||i.)- 
We may clearly take C{t) := P({||[/||Bt^ < 1}). □ 

Lemma 16. Let A > 0. Under assumption A and with w as in Proposition HE there 
is the estimate 



ESx{m,A) 



< c(l + llmlL-. )^i(l + Pb* -B™)^' 

— V 'II ii-Df^m'' V II ii^vp ' 



where Bi = — ^^"1 , and 02 = max( — , — ,,., ). 

' ^ p(w—r+2r/p') ' ^ ^ p ' p(w—r+2r/p') ' 

Proof. We apply Lemma [T^ with the choice a = p/\, f = (|(1 + Ht/H^t ))^ and 

F = exp ( — |||A?7||^2 + {AU, m)). One inserts the estimate for the quantity EF^ = 
/', see (H2l) . obtained in (H3l) and (l53ll in the proof of Proposition [T2l Moreover, a 
lower bound for EF is given by Lemma fTSl and we recall that E exp(/'^/2)) < oo 
thanks to Lemma fTOl □ 

We are ready for the main result of this section. 
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Theorem 17. Consider the ratio {m, A, G) / H^{m, A, G) as a map 

Under assumption A, and with w as in Proposition [TR the local Lipschitz constant 
of this function satisfies 



(61) 

T.inf 

H 



Lip(|^)(m, A G) < c(l + ||G||^,^^^.J^(1 + ||m||^-.^)-(l + 11^^.^^^.; 



where a = l + (^) max(l, ( f^)) and ^ = 1 + (^)( ^^). 

Proof. Denote K = ll'^lls-'j^, « = ||^||b*p^_b™ and v = ||G||^t • Proposition 
[13] verifies that both and are locaUy Lipschitz. As a special case we obtain 
that E,Sq+2{fn, A) is continuous with respect to variables A and m. Hence inequality 
(1571) yields the estimate 

Up{HS){m,A,G) < c{K + a + {1 + vy)E Sg+2{m, A). 

The simple inequality (here Xi,X2 are vectors and yi,?/2 > are scalars) 



^2|| , 1 II II , \y2-yi\ II 

I II < — 1^2 - Xi\\ H 

1/1 1/2 Z/2 |/l?/2 



verifies that point-wise 



^ Lip(g^) , imis^UpiH'] 



.^1/- ^1 (^1)2 

Observe that by assumption A we have || if^H^i^ < cE Sg. By combining the previous 
estimates and remembering that = E 5*0 we thus obtain 

L.p(§;,..(.-...a..)')(^)H-<||)e(..,(^). 

The statement then follows by three applications of Lemma [161 n 



Remark 18. The main content of Theorem [T7] is a stability estimate that grows 
polynomially in the norm of the measurement m. We have not striven for the optimal 
exponents in the above computations. Moreover, one should observe that in the 
case p > 1 it is possible to choose w = r in Theorem [TTl which corresponds to 
the weakest condition on smoothness of A. If p G (1, 2) this yields the exponents 
a = 7 = 1 + p'{q + 4)/p. In the important special case of p = 1 one is forced to 
choose w > r. On the other hand, the choices w > r yield considerably smaller 
exponents for all small values p > 1: in the limit w ^ oo one has a ^ 1 + (^2^), 

and7-l + (^). 
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6. Convergence results for Besov priors 
In the present section we assume that indices t, t, r and the quantities m, A, and 



g satisfy Assumption A. We take T„ : 5* (T"^) 5* (T"^), n > 1, defined by the 



famihar truncation 

(oo \ n 

1=1 J i=i 

discussed above and in Appendix |Al Then T„ — > / strongly in L{Bpp) as n — >• oo. 
We consider proper hnear discretizations Un = TnU of the random variable U . 

By standard compact imbedding results it follows that (/ — T„) — > as n ^ oo 
in the operator norm topology of 1?*^), and (/ — Pk) ^ as A; — » oo in 

the operator norm topology of L(i?[]^, with r > r, see Appendix Rl Next we 
formulate the assumptions in quantitative terms. 

Assumption B: Let r > r and assume that A : Bpp{T'^) —>■ B[i{T'^). Define Tn by 
); then we have 



(63) \\I -TnW^j^t Bt ) <V2in) forn>l 

with lim„^oo '72(^) = 0. Further, let Pk : -Bpp(T'^) B'^piT'^) he hounded linear 
projections with k-dimensional range satisfying 

(64) II / - PkhiBi^Bi,) < Vi{k), fork>l 

with \imk~^ooVi{^) = 0- Assume that \\Tn\\^B^^ ) — ^ ^^'^ II -Pfe II L{_Bj;p) ^ C for all 
k,n > 1. The measurements are assumed to he uniformly hounded hy a constant K: 

(65) II'^IIr-'- and ||wife||r>-r <K for all k > 1. 
Finally, it is assumed that 

(66) VsW •= ll^fc ~ '"^IIb"'' satisfies lim rj^lk) =0. 

Let Qkn = PkAUn + £■ As before we define the reconstructor lZQf,^{g{Un)\mk) 
corresponding to model Qkn at ruk as 

(67) 7^e, {g{Un)\mk) = ^^-'Q'^" j'^'j . 

The following result yields a quantitative convergence result for Besov priors: 

Theorem 19. Let Assumptions A and B hold. Denote 7 = 1 + (^y^)( ^_^^2r/p' )- 
There is a constant C = C'{A, g,t,r,w,p) such that 

||7^e,„(t/„|mfc) - 7^M(f/|m)||B. < C'{1 + K^Mk) + r^^^n) + r,^{k)]. 
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Moreover, the limits 

lim ne^^„{g{Un)\mk) and lim 7^efc„(5'(^n)|"^fc) 

n — >oo k — ^oo 

exist for a fixed value of k (resp. n). 

Proof. The statement is an immediate consequence of the estimates obtained in 
the previous section: observe that in our notation 

HLeJ^k) _ H9{mk,PkATn,Tn) 
Hlr„,e,S^k) H^{mk,PkATn,Tn)' 

Moreover, apart from a possible change of the uninteresting constants, the Lipschitz 
bound given by Theorem [T7| on the quantity H^/H^ is uniform on any bounded 
neighbourhood of m, A and G, where G is the identical embedding Id : B^^ — > B^^. 
Note also that by our assumptions {A — PkATn) — > in the operator norm topology 
of L{Bpp,B'[^). The first claim now follows from definitions and Theorem [T71 The 
last statements follow immediately by the same reasoning. □ 

We emphasize that Theorem [19] covers the highly interesting value p = 1 despite 
the failure of assumption (l28ll of the general theory in that case. 

Let us revisit the deblurring example of Section 12.21 where p=l,s = l,(i = 2 
and A has a smooth kernel. Often it is possible to take the projection Pk related 
to the measurement device to be truncation of the wavelet expansion to the first k 
terms analogously to ( l62l) . (For instance, Pk might measure local averages at a grid 
of points and then compute discrete wavelet transform by convolutions with finite 
filters.) 

Then Theorem [19] yields the convergence of reconstructors and a result analogous 
to Theorem [7] with an explicit convergence speed. 

Corollary 20. Let Assumptions A and B hold. Let p = s = 1 and assume that 
A : D'(T^) — > C°°(T^) is a hounded linear operator. Let Pk and Tn he truncations 
to k and n first terms in wavelet expansion, respectively. Moreover, let t < t < —1, 
r > ri > 1, X > 11, and r > 0. Then 

(i) There is a constant Cq = Co(A, r, t. A) > such that the reconstructors satisfy 
||7^e,„(f/„|mfc) - 7^A/(t/|m)||B*^(^^2) < Co(l + KfMk) + r,2{n) + r^^{k)]. 

(a) Let u = U{uJo), e = S{ujq), E Q he realizations of the random variahles U and 
S, and 

m = Au + e, mfc = A^u + P^^? 

he the realizations of the measurements ^ and respectively. Moreover, assume 
that m G B^{'^{T'^). Then there is G > independent of n and k so that the 
reconstructors defined in Theorem^ for measurements ^ and ^ satisfy 

(68) ||7^M,„(^7„|m,)-7^M(f/|m)||Bt^(^^2) < G[k~^ + n'^^-'^l^ 
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Note that in (ii) we have m G B-^^^^lT"^) for P-a.e. u. 

Proof. Claim (i) follows directly from Theorem (TH] when we use q = 1 and so 
large w that A > 7. As A is an infinitely smoothing operator, we can choose above 
any —00 < t < t < —1 and f > r + 2r > ri + 4r. Moreover, since m G i?pJ^(T^), 

we can take above 'qii^k) = Cik~^'^~^'>^'^, r]2{n) = C2ri~*-*~*^''^, and 'qs^k) = c^k''^^'^^^^'^ 
where Ci, C2 and C3 depend on ujq and the parameters r, r, ri, t, t, but not on or n. 
As the projections Pk satisfy the conditions flMl) and flH^ . the assertion (ii) follows 
from (i) and Lemma El n 



Appendix A. Besov spaces and wavelets 

Let ip and be compactly supported wavelet and scaling function suitable for 
multi-resolution analysis of smoothness in L^(M). 

Following Daubechies [TBI section 9.3] we construct a wavelet representation for 
periodic functions in M with period 1; in other words, for functions on the one- 
dimensional torus T^. Set 

<PjAx) = 5^0(2^'(x + £)-A;), 

We use in the following the subspaces of L^(T^), 

Vj := span{0j^fc | k G Z}, Wj := spa.n{ipj^k \ k G Z}. 

It turns out that Vj are spaces of constant functions for j < 0. Thus we have a 
ladder Vq C Vi C V2 C ■ ■ ■ of multiresolution spaces satisfying 

Further, we denote the successive orthogonal complements of Vj in Vj^i by Wj for 
j > 0. Then we have orthonormal bases 

{0,,fc|fc = O,...,2^-l}in\/„ 
{ijj,k\k = 0,...,2^ -1} in Wj. 

Following Meyer [171 section 3.9] we define a wavelet basis for periodic functions 
in W^; in other words, for functions on the torus T*^. Let E denote the set of 2^^ — 1 
sequences u = (z^i, 1/2, ... , i^d) of zeroes and ones, excluding the sequence (0, 0, . . . , 0). 
Define for u & E and j > the wavelets 

:= 2'^-^-/2^.i(2^xi - k,) . . . r'i^^x^ - k,) 
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with the convention that = (p and ip^ = i/), and integer-valued components of 
vector k ranging over 

0<ki<2^~l for all i = l,2,...,d. 

The functions constitute an orthonormal basis for L'^{T'^). Let us renumber 

the above basis functions using just one integer £ = 1, 2, ... . First, i = 1 corresponds 
to the scaling function 0(xi) . . . The remaining numbering is done scale by 

scale; that is, we first number wavelets with j = 0, then wavelets with j = 1, and so 
on. The 2"^ — 1 indices v & E are naturally numbered by thinking them as binary 
representation of integers. The exact ordering of all 2^'^ translations corresponding 
to a fixed j can be chosen arbitrarily. This leads to a numbering of the following 
type: 



scale j = 
scale j = 1 
scale i = 2 



2 2'^ 

2^^ + 1, . . . , 2^^^, 

2^^^ + 1, . . . , 2^"^, 



According to Meyer [17| Section 6.10], we can characterize periodic Besov space 
functions using these wavelets. Namely, the series 

oo 

fix) = ^ce^eix) 
belongs to Bp^iT'^) if and only if 



We always assume that r is large enough for providing bases for Besov spaces with 
smoothness s. The case q = p is especially relevant to us, and we obtain the 
equivalent norm 



(69) \\Y.^iMx)H^iT-^) ■= 



1/p 



We use the above quantity as the definition of the Besov norm || ■ ||Bs^(Td) for gener- 
alized functions on T"', s G M andp G [l,cx)]. In case p = oo the definition must be 
understood as follows: 



£>1 
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It follows that Bpp(T'^) is isometrically isomorphic to the sequence space i^, and 
by the simplicity of the norm it is easy to control the basic properties of the spaces. 
Especially, there is an embedding (an easy corollary of the Holder inequality), 

(70) B;1,^^{T'') C B^l^^iJ") if and only if s, - d/p^ > - d/p^, 

and it is easy to verify that this embedding is compact if Si — d/pi > S2 — d/p2- 
The dual of B^piT'^) is B~y(T^) if p G (l,cx3), and p' stands for the dual index: 
1/p + 1/p' = 1. The duality is with respect to the standard duality bracket: if 
/ = Yl'eLi fii'i ^iid g = Yl'eLi Qei'e finite sums, then 

« 00 
(/, 9) = f{x)g{x) rfx = V figi. 

We finally observe that natural bounded linear projection operators on B^^lT'^) are 
obtained by setting 

n 

Tnf{x) = ^C£V'£(x). 

e=i 

These projections work at the same time for all the spaces and are contractions. 
Moreover, if p < cx), then lim^^oo ||/ - 7'n/||s|^(T'i) = for all / G ^^^(T'^). 

Appendix B. Examples of limits of finite-dimensional random 

variables 

Here we illustrate difficulties related to finite-dimensional models and their pos- 
sible convergence to an infinite-dimensional continuum model. Unless otherwise 
stated, we work on a circle T^, or equivalently, the interval [0,1] with end points 
identified. Let u G L^(T^). We consider two measurements 

A«M= [ u{t)dt, A^^\ = u{l) 

and the corresponding measurement models 

(71) M^^) = A^^Wn + ^, M(^) = A^^^U + 

(72) ) = A^^^Un + 8, M(2) = A(2)f/ + 8 

where Un is a finite-dimensional random variable, f/ is a random variable in an 
infinite-dimensional function space, and £^ is a normalized Gaussian random variable 
independent of U and f/„. We consider various examples of [/„ and study whether 
some random variable U could be considered as a limit of f/„ as n — > 00, and whether 
models ( I7T1) or ( I72l) make sense in the limit. 

In all examples below, X", j, G N are independent real- valued normalized Gauss- 
ian random variables. 
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Example 1. ("Non-proper discretization of white noise") Let I{n,j) = {^-f^, 
j = 1, . . . ,n and Xjit) = Xi{n,j)if) be the indicator functions of intervals I{n,j). Let 

n 

(73) U^{t) = Y,a]X^x]{t), teT' 

i=i 

where a" > are parameters. 

For a fixed n, with an ad hoc choice a" = 1, the functions Un could be considered 
as an interesting random signal. However, for any function G L^(T^) 

lim / Un{t)(f){t) dt = in distribution, 

and thus Un, considered as L^(T^) valued random variables, converge to zero weakly 
in distribution as n oo. Taking U = we see that the measurements Mn^ 
converge in distribution to M^^\ Concerning measurement ( 1721) we notice the 
Unil) ~ ^(0.1)' but [/(|) ~ N{0,0). Thus Mi^^ ~ N{0,2) for all n, but for 
?7 = we have M^"^^ ~ ^(0, 1). This shows that measurement models (172|l do not 
behave nicely with the choice a" = L In this example Un are not proper linear 
discretizations of U in any Banach space, since the second condition in Definition [1] 
is violated. 

Example 2. ("Proper discretization of white noise") Consider random variables 
Un defined in ( 1731) with parameters a" = n^/^. This choice is motivated by the fact 
that the functions n^^'^Xiinj) are orthonormal in L^(T^). Then, for G C°°(T^) 

(74) lim / Un{t)(f){t)dt= {U,(f)) in distribution, 

where U is Gaussian white noise in L^(T^). This actually holds also when G 
H^{T^). Let Qn be the L^(T^)-orthogonal projection on the subspace of functions 
that have constant value on the intervals I{n,j). Then (Un, <p) appearing on the left 
hand side of (17^ is a real- valued Gaussian random variable with covariance 

n 

Y.mn''\Hn,))? = \\Qnm 

As n — > oo this tends to the value ||0||2, as is easily seen by first approximating 
by smooth functions. Hence [/„—>■[/ weakly in distribution in the space H~^{T^). 

Moreover, we note that f/„ has the same distribution as the random variable T„f/, 
where T„ : H^^(T^) — > H^^(T^) is the linear operator T„f = X]j=i('^) 'P'^) ^^^"^Xiinj), 
where (0")"^^ is any orthonormal sequence in L^(T^) consisting of elements of 
H^{T^). We have thus verified that the variables Un are proper linear discretizations 
of U in the space H~^{T^), according to Definition [H 

Now the measurement M^^^^ is well defined and Mn^ converge in distribution to 
as n — > oo. However, we have Mn'^ ~ iV(0,n) and thus measurements M^^-* 
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do not converge in distribution as n oo. This is related to the fact that the 
white noise [/ is a well defined H~^{T^) valued Gaussian random variable, and the 
constant function 1 is in the dual of the space H~^{T^) but the point evaluation 
u I— *• does not define a bounded linear operator in H~^{T^). 

Remark 21. In Example 2 above one may verify that it is possible to choose the 
L^(T-'^)-orthonormal sequence (0")"^;^ so that the norm \\Tn\\ h-^{t'^)^h-'^ (T'^-) remains 
uniformly bounded for all n. However, this choice is somewhat complicated. A 
way to construct discretizations with this property (and such that they fall in the 
scope of the basic theory developed in Section [3]) is to apply suitable finite dimen- 
sional approximations of identity that are uniformly bounded simultaneously on 
both if~^(T^) and L^(T^). E.g., one may truncate Fourier series or apply basis pro- 
jections corresponding to a wavelet basis (set p = 2 in Section [6]). Details of these 
comments will be considered elsewhere. Similar remarks apply to Example 3 below. 

Example 3. ("Discretization of the Gaussian smoothness prior") Choose contin- 
uous functions r/" : — > M so that they are affine on intervals I{n,j), j = 1, . . . ,n 
and that (77")"^]^ are orthonormal in H^{T^). Let 

n 

(75) U^{t) = J2b]X-r^J{t), 

where 6" > are parameters. We choose 6" = 1. Then, for </> G L^(T^) 

(76) lim / Un{t)(f){t) dt = / U{t)(j){t)dt in distribution, 

where [/ is a Gaussian random variable in L^(T^) having zero expectation and 
covariance operator (/— A)~^, that is, U is the one-dimensional Gaussian smoothness 
prior. Let Q„ be the if^(T^)-orthogonal projection onto the subspace Yn spanned 
by j = 1,2,. . . ,n. Analogously to Example 2, one can see that f/„ have the 
same distribution as random variables T„t/ where where : L^(T^) — > L^(T^) is 
the linear operator 

n 

i=i 

where is any orthonormal sequence in H^{T^) consisting of elements of 

if^(T^). Thus Un are proper linear discretizations of U in L^(T^). 

Let us next consider G if~^(T^). Due to the formula (ff5l) . the (if^(T^) x 
iJ^^(T^))-duality ([/„(u;),0) defines a real- valued Gaussian random variable with 
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covariance 

n 

(77) E((t/„,0)2) = ¥.{Y.h-X,hlX,{r^l<P){r^t<P)) 

j,k=i 

n 

= ||Q„(J-A)-Vll?,i(Ti)- 
The kernel of the covariance of operator of U in L^(T^), that is, the function 

G(t,t') =E(f/(t)t/(t')), t,t'eT^ 
is Green's function of the differential operator — ^ + 1, 

{-^,+l)G{tX) = Kt-t'). 

This implies that E (t/(t)f/(t')) is Lipschitz smooth on x T^. As U is Gaussian, 
one can see using e.g. Theorem 3.23] that the values of the Gaussian smoothness 
prior U are almost surely in any Holder space C°(T^) with a < 1/2. Thus, after 
fixing a G (0,|), we can consider U also as a Gaussian C"(T^)-valued random 
variable satisfying 

(78) E((f/,0)^) = ||(/-A)-V||^i(Ti) 

for every (f) in the dual space of C^iT^). Note that as H^{T^) C C"(T^), we have 
{C°'{T^)y C H^^(T^). Thus, since the projectors Q„ converge strongly to identity 
in H^(T^) as n ^ oo, we can use ( |77j) . ( |78l) . and the fact that the ranges of the 
operator T!„ used above are in H^{T^) C C"(T^), and infer that that Un are proper 
linear discretizations of U in C"(T^), too. Because of this the measurements M*^^^ 
and M(2) are well defined and one can see that the measurements Mn^ and Mn^ 
converge in distribution to M^^^ and M^^\ respectively, as ^ oo. 

Example 4. ( "Discrete total variation priors" ) Let us next consider an example 
on interval / = [0, 1] (i.e., the end points are not identified) and let 9j- : [0, 1] — > M 
be continuous functions with are affine functions on intervals I{n,j), j = 1, . . . ,n, 
vanishing at t = and t = 1 such that 6'", j = 1, 2, . . . , n — 1 are orthonormal in 
H\I). Let 

n-l 

(79) t^n(t) = $^^>;(t). 



37 



where = {Z" , Z2 , . . . , -^"^i) is a M" valued random variable having the probabil- 
ity density function 

Trzn{zo,...,Zn-i) = c„exp ^^^/"(t)) 

where a„ > is a parameter and normalization constant of the probability 

density function. The distribution of the random variables Un are sometimes called 
the discrete total variation prior. By [39], the random variables f/„ converge in 
distribution if a„ = n^/^ but then the limit f/ is a Gaussian random variable. As a 
Gaussian distribution stays Gaussian in a linear transformation, we see that in this 
example f/„ are not proper linear discretizations of U in any Banach space. 

Example 5. ( "Discretization of the two-dimensional Gaussian smoothness prior" ) 
Let us consider also higher dimensional example analogous to Example 3. Let 
L(n, j, k, 1), L{n,j, k, 2) C be disjoint triangles so that their union is the square 
/(n, j) X I{n, k) C T^, j, = 1, . . . , n. Let Clk,i : ^ M, j, A; = 1 . . . , n, / = 1, 2 be 
continuous functions on which are affine functions on triangles L{n, j, k,l) such 
that j, k = 1,2, . . . ,n, I = 1,2 are orthonormal in H^iT"^). Let 

n 2 

(80) [/„(ti,t2) = H^lkiXUlkiitiM). e r 

j,k=l 1=1 

where X^i^i ~ A^(0, 1) are independent and 6";.; > are parameters. Let 6"^; = 1. 
Then the Un can be considered as Gaussian random variables in H^^{T^) that 
converge weakly in distribution to a /7~^(T^) valued Gaussian random variable U 
having zero expectation and the covariance operator (/ — A)~^ in H~^{T^). This 
means that U is the two-dimensional Gaussian smoothness prior. 
Let now u G L^(T^). We consider two measurements 



u 



I u{t)dt, A(2)M = M(i,i) 



and define models M^^\ M^^) and M^^\ M^^) as in flTTl) where £ is Gaussian white 
noise in L^(T^). Then, the measurement M'-^-* is well defined and the measurements 
converge in distribution to M*^^^ . However, one can show that ~ A^(0, al), 
where (T„ ^ 00 as ^ 00 and we see that the measurements MA do not converge 
in distribution as n ^ 00. In this example one can verify that the f/„ are proper 
linear discretizations of U in H^^iT'^). 

f2) 

The fact the point value measurements Mn do not converge is related to the fact 
that the two-dimensional Gaussian smoothness prior has, formally speaking, the 
covariance function E {U{t)U{s)) = G{t, s), t, s G is the Green's function of the 
operator — A -|- / and has thus on the diagonal t = s a. logarithmic singularity. This 
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fact is extensively used in quantum field theory in the study of the free Gaussian 
field, a random field that is very similar to the Gaussian smoothness prior [58] . 

Appendix C. On the domain of reconstructors 

Considering formula ( 1271) in the infinite-dimensional case, we meet the difficulty 
that realizations of M belong to Z only with probability zero. Therefore the function 
m 1-^ TZM{U\'m) should be defined in some larger set than Z. 

A generalized definition of a reconstructor is as follows: 

Definition 22. The deterministic function TZm{U\ ■ ) : Sq ^ Y , where m 
7lM{U\m), defined in a Borel-measurahle subspace So C S is reconstructor of U 
with measurement M if M{uj) G Sq almost surely and 

(81) E{U\M){u) = nM{U\M{uj)) almost surely. 

The quantity E{g{U) \ ■ ) : Sq ^ Y is defined analogously. 

For the domain of TZM{U\m) we could consider any of the non-trivial Borel- 
measurable subspaces L G S, such that the realization M{uj) belongs to L almost 
surely. Given any two such subspaces Li and L2, the value of the function 7lM{U\m) 
can be changed in Li \ L2, without contradicting the property ( ISTl) . It is tempting 
to choose the domain to be the intersection of all such subspaces L G S, but un- 
fortunately, this intersection is the set Z where the realizations of M lie only with 
probability zero. It appears to be hard to pick a candidate for a smallest space Sq 
where the reconstructor 7lM{U\m) should be defined. 

Because of the above difficulties, we restricted ourselves to the case where the 
operator A maps A : S ^ implying that m i— TZM{U\m) can be defined in the 
whole space S. 
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